ABSTRACT

We give a probabilistic interpretation of sampling theory of graph
signals. To do this, we first define a generative model for the data
using a pairwise Gaussian random field (GRF) which depends on
the graph. We show that, under certain conditions, reconstructing a
graph signal from a subset of its samples by least squares is
equivalent to performing MAP inference on an approximation of this GRF
which has a low rank covariance matrix. We then show that a
sampling set of given size with the largest associated cut-off frequency,
which is optimal from a sampling theoretic point of view, minimizes
the worst case predictive covariance of the MAP estimate on the
GRF. This interpretation also gives an intuitive explanation for the
superior performance of the sampling theoretic approach to active
semi-supervised classification.

Index Terms: Graph signal processing, sampling theorem,
Gaussian Markov random field, semi-supervised learning, active
learning

1. INTRODUCTION 

Graph signal processing aims to extend the tools for analysis,
approximation, denoising and interpolation of traditional signals to
signals defined on graphs. The advantage of this framework is that it
allows us to process the given data while taking into consideration
the underlying connectivity between the data points. The graph can
be inherent to the data, as is the case in application areas such as
social networks and sensor networks, or it can be constructed from the
data to capture the underlying geometry. Examples of the latter are
found in image processing and machine learning (see [1, 2]).

In this paper, we focus on the sampling theory of graph signals.
The classical Nyquist-Shannon sampling theorem says that a signal
with bandwidth $f$ is uniquely determined by its (uniformly spaced)
samples if the sampling rate is higher than $2f$. Intuitively, it tells
us how "smooth" the signal has to be for perfect recovery, given
the sampling density, and vice versa. Moreover, the signal can be
perfectly reconstructed from the samples by a simple low pass filter.
Sampling theory of graph signals similarly deals with the problem of
reconstructing an unknown graph signal from its samples on a subset
of nodes. The frequency domain representation of graph signals is given
by the eigenvectors and eigenvalues of the Laplacian matrix
associated with the graph. In order to pose a sampling theorem analogous
to the Nyquist-Shannon sampling theorem, we need to find the
maximum bandwidth (in the graph spectral domain) that a graph signal
can have so that it is uniquely determined by its samples on the given
subset of nodes. Conversely, given the bandwidth, we need to find
the smallest subset of nodes such that recovery of any signal with that
bandwidth, from its samples on that subset, is unique and stable.
Given that the signal is smooth enough to be uniquely represented
by its samples on a subset of nodes, we need an efficient and
stable algorithm to reconstruct the unknown samples. These
questions have been answered to some extent in [3, 4, 5, 6]. We discuss
some of these results in Section 2.

This sampling theoretic perspective has been shown to be very
useful for graph based active semi-supervised learning [7]. In this
context, label prediction is considered as a graph signal
reconstruction problem. The characterization of a subset of nodes given by the
sampling theory, namely the associated cut-off frequency, is used as a
criterion function to choose the optimal set of nodes to be labelled for
active learning.

Sampling theoretic approaches for active and semi-supervised
learning [7] are purely deterministic. However, their probabilistic
interpretation is desired for the following reasons: 1. It allows us
to understand them as model based methods and thus makes it
easier to include them as components of a larger probabilistic model.
2. It can also suggest a principled way to refine the model
parameters (which are given by the underlying graph) as more data is
observed (see [8] for an example). 3. The interpretation presented in
this paper assumes a Gaussian random field model for the data. This
may lead to generalizations of the sampling theory to data with
non-Gaussian distributions, which might be more realistic for a
classification problem. 4. This interpretation also makes the relationship
between the sampling theoretic approach and previously proposed
semi-supervised [9] and active learning [10, 11] methods more
apparent, as discussed in Section 5.

The main contributions of this paper are the following. We
define a generative model for graph signals using a pairwise Gaussian
random field (GRF) with a covariance matrix that depends on the
graph. We show that, when conditions of the graph signal sampling
theorem are satisfied, bandlimited reconstruction of a graph signal
from a subset of its samples is equivalent to performing MAP
inference on a low rank approximation of the above GRF. This
learning model performs very well in classification problems, as
demonstrated in the experiments, since the true data covariance matrix is
expected to be close to low rank. We then show that a sampling set
of given size with the largest associated cut-off frequency, which is
optimal from a sampling theoretic point of view, minimizes the worst
case predictive covariance of the MAP estimate on the GRF.

2. SAMPLING THEORY OF GRAPH SIGNALS 
2.1. Preliminaries and Notation 

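We consider a connected, undirected and weighted graph
$G = (\mathcal{V}, \mathcal{E})$. The nodes $\mathcal{V}$ in the graph are
indexed by $\{1, 2, \ldots, N\}$. For a subset of nodes $\mathcal{S}$,
$\mathcal{S}^c$ denotes the complement of $\mathcal{S}$ in $\mathcal{V}$,
i.e., $\mathcal{S}^c = \mathcal{V} \setminus \mathcal{S}$. The edge
set $\mathcal{E}$ is given by $\{(i, j, w_{ij})\}$, where
$i, j \in \mathcal{V}$ and $w_{ij} \in \mathbb{R}^{+}$.
$(i, j, w_{ij})$ denotes an edge with weight $w_{ij}$ connecting nodes $i$
and $j$. The connectivity information given by $\mathcal{E}$ is encoded by
the adjacency matrix $\mathbf{W}$ of size $N \times N$ with
$\mathbf{W}(i, j) = w_{ij}$. The degree matrix $\mathbf{D}$ is a diagonal
matrix $\mathrm{diag}\{d_1, \ldots, d_N\}$, where $d_i = \sum_j w_{ij}$ is
the degree of node $i$. The Laplacian matrix is defined as
$\mathbf{L} = \mathbf{D} - \mathbf{W}$. The symmetric normalized form of
the Laplacian is given by
$\mathcal{L} = \mathbf{D}^{-1/2} \mathbf{L} \mathbf{D}^{-1/2}$. A graph signal
$f : \mathcal{V} \to \mathbb{R}$ is a mapping which takes a real value on
each node of the graph. It can be represented as
$\mathbf{f} = (f_1, \ldots, f_N)^T \in \mathbb{R}^N$. For
$\mathbf{x} \in \mathbb{R}^N$, $\mathbf{x}_{\mathcal{S}}$ denotes a
sub-vector of $\mathbf{x}$ consisting of its components indexed by
$\mathcal{S}$. Similarly, for $\mathbf{A} \in \mathbb{R}^{N \times N}$,
$\mathbf{A}_{\mathcal{S}_1 \mathcal{S}_2}$ is the sub-matrix of $\mathbf{A}$
with rows indexed by $\mathcal{S}_1$ and columns indexed by $\mathcal{S}_2$.
For simplicity, we denote $\mathbf{A}_{\mathcal{S}\mathcal{S}}$ by
$\mathbf{A}_{\mathcal{S}}$. We use $\lambda_{\max}[\cdot]$ and
$\lambda_{\min}[\cdot]$ to denote the largest and the smallest eigenvalue
of a matrix, respectively. $\mathrm{tr}(\cdot)$ denotes the trace of a
matrix. $\mathbf{A}^{+}$ is used to denote the pseudo-inverse of
$\mathbf{A}$. $\mathbf{1}$ and $\mathbf{0}$ denote vectors or matrices of
ones and zeros, respectively.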

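It can be shown that $\mathbf{L}$ and $\mathcal{L}$ are positive
semi-definite. Hence, $\mathbf{L}$ has real eigenvalues
$0 = \lambda_1 \le \lambda_2 \le \ldots \le \lambda_N$ and a
corresponding orthogonal set of eigenvectors
$\{\mathbf{u}^1, \mathbf{u}^2, \ldots, \mathbf{u}^N\}$. It can
be diagonalized as $\mathbf{L} = \mathbf{U} \mathbf{\Lambda} \mathbf{U}^T$,
where $\mathbf{U} = (\mathbf{u}^1, \ldots, \mathbf{u}^N)$ and
$\mathbf{\Lambda} = \mathrm{diag}\{\lambda_1, \ldots, \lambda_N\}$.
Variation in the eigenvectors of $\mathbf{L}$ over the graph (as captured by
$\mathbf{u}^T \mathbf{L} \mathbf{u} = \sum_{(i,j) \in \mathcal{E}} w_{ij} (u_i - u_j)^2$)
increases as the corresponding eigenvalues increase. Thus, these eigenvectors
allow us to define a graph dependent notion of frequency for
graph signals. The so-called Graph Fourier Transform (GFT)$^1$ is
defined as $\tilde{f}_i = \langle \mathbf{f}, \mathbf{u}^i \rangle$ (or, in
an equivalent matrix form, $\tilde{\mathbf{f}} = \mathbf{U}^T \mathbf{f}$),
where $\tilde{f}_i$ is the GFT coefficient corresponding to frequency
$\lambda_i$. An $\omega$-bandlimited signal has its GFT supported on
$[0, \omega]$, i.e., $\tilde{f}_i = 0$ for $\lambda_i > \omega$. Conversely,
such a signal is said to have a bandwidth equal to $\omega$. If
$\{\lambda_1, \ldots, \lambda_r\}$ are the eigenvalues less than $\omega$, then
any $\omega$-bandlimited signal can be written as a linear combination of
the corresponding eigenvectors

$$\mathbf{f} = \sum_{i=1}^{r} a_i \mathbf{u}^i = \mathbf{U}_{\mathcal{V}\mathcal{R}} \mathbf{a}, \tag{1}$$

where $\mathcal{R} = \{1, \ldots, r\}$ and $\mathbf{a}$ is the coefficient
vector. The space of $\omega$-bandlimited signals
is called a Paley-Wiener space $PW_{\omega}(G)$.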
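The definitions above can be illustrated with a minimal NumPy sketch. The toy graph (an unweighted path on six nodes), the number of retained eigenvectors `r` and the coefficient vector `a` are assumptions chosen only for illustration; they do not come from the paper.

```python
import numpy as np

# Hypothetical toy graph (not from the paper): unweighted path on N = 6 nodes.
N = 6
W = np.zeros((N, N))
for i in range(N - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0

D = np.diag(W.sum(axis=1))   # degree matrix
L = D - W                    # combinatorial Laplacian L = D - W

# Eigendecomposition L = U diag(lam) U^T; eigh returns ascending eigenvalues.
lam, U = np.linalg.eigh(L)

def gft(f):
    """Graph Fourier transform: inner products <f, u^i> for all i."""
    return U.T @ f

# A bandlimited signal built from the first r eigenvectors, as in (1).
r = 3
a = np.array([1.0, -0.5, 0.25])   # arbitrary illustrative coefficients
f = U[:, :r] @ a

f_tilde = gft(f)
# Variation u^T L u of each eigenvector; it equals the matching eigenvalue.
variation = np.array([U[:, i] @ L @ U[:, i] for i in range(N)])
```

In this sketch the GFT coefficients of `f` vanish beyond index `r`, and the variation of each eigenvector grows with its eigenvalue, which is the sense in which the eigenvalues act as frequencies.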

2.2. Sampling Theorem and Bandlimited Reconstruction 

Sampling theory deals with the problem of reconstructing an
$\omega$-bandlimited signal $\mathbf{f}$ from its samples
$\mathbf{f}_{\mathcal{S}}$ on the nodes in $\mathcal{S} \subset \mathcal{V}$.
There are three important questions that need to be answered in this
context: 1. Given $\mathcal{S}$, what is the maximum bandwidth $\omega$ that
$\mathbf{f}$ can have so that it is uniquely determined by
$\mathbf{f}_{\mathcal{S}}$? 2. Which is the best sampling set
$\mathcal{S}_{\mathrm{opt}}$ of a given size $m$? 3. Given that $\mathbf{f}$
is uniquely determined by $\mathbf{f}_{\mathcal{S}}$, how do we find the
unknown samples $\mathbf{f}_{\mathcal{S}^c}$? We briefly
review some of the results related to each of the above problems.

Let $L_2(\mathcal{S}^c)$ be the space of signals which are identically zero on
$\mathcal{S}$ but can have non-zero samples on $\mathcal{S}^c$, i.e.,
$\mathbf{g}_{\mathcal{S}} = \mathbf{0}$ for all $\mathbf{g} \in L_2(\mathcal{S}^c)$.
It is easy to see that for all signals in $PW_{\omega}(G)$ to be uniquely
determined by their samples on $\mathcal{S}$, we need
$PW_{\omega}(G) \cap L_2(\mathcal{S}^c) = \{\mathbf{0}\}$.
This observation leads to the following theorem.

Theorem 1 (Sampling Theorem [6]). Any signal in $PW_{\omega}(G)$ can
be uniquely reconstructed from its samples on a subset of nodes $\mathcal{S}$ if
and only if

$$\omega < \inf_{\mathbf{g} \in L_2(\mathcal{S}^c)} \omega(\mathbf{g}), \tag{2}$$

where $\omega(\cdot)$ denotes the bandwidth of a signal. If the above condition
is satisfied, then $\mathcal{S}$ is said to be a uniqueness set for $PW_{\omega}(G)$.

$^1$ The GFT is usually defined using the normalized Laplacian $\mathcal{L}$. We
define it using $\mathbf{L}$ for the sake of notational simplicity. However, most of the
discussion in the paper can be easily generalized to $\mathcal{L}$.

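To ensure unique recovery of a signal from its samples on $\mathcal{S}$,
its bandwidth has to be less than
$\inf_{\mathbf{g} \in L_2(\mathcal{S}^c)} \omega(\mathbf{g})$. This is called
the cut-off frequency associated with the subset $\mathcal{S}$ and is denoted by
$\omega_c(\mathcal{S})$. An estimate of the cut-off frequency is given by [6]

$$\Omega_k(\mathcal{S}) = \left( \lambda_{\min}\left[ (\mathbf{L}^k)_{\mathcal{S}^c} \right] \right)^{1/k}. \tag{3}$$

It can be shown that $\Omega_k(\mathcal{S}) \le \omega_c(\mathcal{S})$ and that
we get closer to $\omega_c(\mathcal{S})$ as $k$ increases.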
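As a rough illustration of (3), the following NumPy sketch evaluates the cut-off frequency estimate for a few values of $k$. The toy path graph, the sampling set `S` and the helper name `cutoff_estimate` are hypothetical choices made for this example only.

```python
import numpy as np

def cutoff_estimate(L, S, k=1):
    """Omega_k(S) = (lambda_min[(L^k)_{S^c}])^(1/k), following (3)."""
    N = L.shape[0]
    S_set = set(S)
    Sc = [i for i in range(N) if i not in S_set]   # complement of S
    Lk = np.linalg.matrix_power(L, k)
    sub = Lk[np.ix_(Sc, Sc)]                       # principal submatrix on S^c
    lam_min = np.linalg.eigvalsh(sub)[0]           # eigenvalues are ascending
    return max(lam_min, 0.0) ** (1.0 / k)          # guard against tiny negatives

# Hypothetical toy graph: unweighted path on 8 nodes.
N = 8
W = np.zeros((N, N))
for i in range(N - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0
L = np.diag(W.sum(axis=1)) - W

S = [1, 4, 6]                                      # arbitrary sampling set
estimates = [cutoff_estimate(L, S, k) for k in (1, 2, 4, 8)]
```

On this example the estimates do not decrease as `k` grows, in line with the statement that larger `k` gives a tighter estimate of the cut-off frequency.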

A larger cut-off frequency estimate $\Omega_k(\mathcal{S})$ implies that a bigger
space of signals can be perfectly recovered from their samples on
$\mathcal{S}$. Therefore, $\Omega_k(\mathcal{S})$ can be used as a criterion
function to be maximized for choosing the optimal sampling set
$\mathcal{S}_{\mathrm{opt}}$ of given size $m$, i.e.,

$$\mathcal{S}_{\mathrm{opt}} = \arg\max_{|\mathcal{S}| = m} \Omega_k(\mathcal{S}). \tag{4}$$

The above problem is combinatorial and NP-hard. A greedy
algorithm for finding an approximate solution is proposed in [6].
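One simple way to approximate (4) is to grow the set one node at a time. The sketch below is a generic greedy heuristic written for illustration; it is not claimed to be the algorithm of [6], and the function names and toy graph are assumptions.

```python
import numpy as np

def cutoff_estimate(L, S, k=1):
    """Omega_k(S) = (lambda_min[(L^k)_{S^c}])^(1/k), following (3)."""
    N = L.shape[0]
    S_set = set(S)
    Sc = [i for i in range(N) if i not in S_set]
    sub = np.linalg.matrix_power(L, k)[np.ix_(Sc, Sc)]
    return max(np.linalg.eigvalsh(sub)[0], 0.0) ** (1.0 / k)

def greedy_sampling_set(L, m, k=1):
    """Generic greedy heuristic: repeatedly add the node that gives the
    largest Omega_k for the augmented set (illustrative, not from [6])."""
    N = L.shape[0]
    if not 0 < m < N:
        raise ValueError("m must satisfy 0 < m < N")
    S = []
    for _ in range(m):
        candidates = [v for v in range(N) if v not in S]
        best = max(candidates, key=lambda v: cutoff_estimate(L, S + [v], k))
        S.append(best)
    return S

# Hypothetical toy graph: unweighted path on 8 nodes.
N = 8
W = np.zeros((N, N))
for i in range(N - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0
L = np.diag(W.sum(axis=1)) - W

S_greedy = greedy_sampling_set(L, m=3, k=2)
```

Each step costs one small eigenvalue computation per candidate node, which is affordable on toy graphs but would need a cheaper update rule on large ones.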

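Consider a signal $\mathbf{f} \in PW_{\omega}(G)$ with
$\omega < \omega_c(\mathcal{S})$. Using the representation of a bandlimited
signal in (1), we get that
$\mathbf{f}_{\mathcal{S}} = \mathbf{U}_{\mathcal{S}\mathcal{R}} \mathbf{a}$.
Since $\mathbf{f}$ is uniquely sampled on $\mathcal{S}$,
$\mathbf{U}_{\mathcal{S}\mathcal{R}}$ must have full column rank,
so that the least squares solution $\mathbf{a}$ of the above system of equations
is unique. The unknown samples can then be reconstructed by:

$$\hat{\mathbf{f}}_{\mathcal{S}^c} = \mathbf{U}_{\mathcal{S}^c\mathcal{R}} \left( \mathbf{U}_{\mathcal{S}\mathcal{R}}^T \mathbf{U}_{\mathcal{S}\mathcal{R}} \right)^{-1} \mathbf{U}_{\mathcal{S}\mathcal{R}}^T \mathbf{f}_{\mathcal{S}}. \tag{5}$$

A faster, iterative method for bandlimited reconstruction is proposed
in [?], which does not need the computation of eigenvectors.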
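The least squares reconstruction in (5) can be sketched directly with NumPy. The helper name `bandlimited_reconstruction`, the toy path graph, the bandwidth index `r` and the sampling set are illustrative assumptions; the sketch uses a dense eigendecomposition and is not the faster iterative method mentioned above.

```python
import numpy as np

def bandlimited_reconstruction(U, r, S, f_S):
    """Least squares estimate (5) of the samples on S^c from those on S."""
    N = U.shape[0]
    S_set = set(S)
    Sc = [i for i in range(N) if i not in S_set]
    U_SR = U[S, :r]            # rows in S, first r eigenvectors
    U_ScR = U[Sc, :r]          # rows in S^c, first r eigenvectors
    if np.linalg.matrix_rank(U_SR) < r:
        raise ValueError("U_SR is rank deficient: S is not a uniqueness set")
    a, *_ = np.linalg.lstsq(U_SR, f_S, rcond=None)   # coefficient vector a
    return Sc, U_ScR @ a

# Hypothetical toy graph: unweighted path on 8 nodes.
N = 8
W = np.zeros((N, N))
for i in range(N - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0
L = np.diag(W.sum(axis=1)) - W
lam, U = np.linalg.eigh(L)

r = 3
f = U[:, :r] @ np.array([2.0, -1.0, 0.5])   # a bandlimited test signal
S = [0, 3, 7]                               # arbitrary sampling set
Sc, f_hat = bandlimited_reconstruction(U, r, S, f[S])
```

Because the test signal lies in the span of the first `r` eigenvectors and `U[S, :r]` has full column rank here, `f_hat` matches the held-out samples `f[Sc]` up to floating point error.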

These sampling theory based algorithms for subset selection and
signal reconstruction have been applied to graph based active
semi-supervised learning and are shown to perform better than many
state-of-the-art approaches [7].

3. GRF MODEL FOR GRAPH SIGNALS 

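In order to give a probabilistic interpretation of the graph signal
processing framework, we define a generative model for the signal using
a pairwise Gaussian random field (GRF) based on the graph $G$. A
random signal $\mathbf{f} = (f_1, \ldots, f_N)^T$ is assumed to be drawn from the
following distribution:

$$p(\mathbf{f}) \propto \exp\left( -\frac{1}{2} \sum_{(i,j) \in \mathcal{E}} w_{ij} (f_i - f_j)^2 - \frac{\delta}{2} \sum_{i} f_i^2 \right) = \exp\left( -\frac{1}{2} \mathbf{f}^T (\mathbf{L} + \delta \mathbf{I}) \mathbf{f} \right), \tag{6}$$

where $\mathbf{I}$ denotes an identity matrix of size $N \times N$ and each edge
is counted once in the first sum. Let $\mathbf{K}$ be the
covariance matrix of the GRF. Then, from the above equation,
the inverse covariance matrix (also known as the precision matrix)
can be written as:

$$\mathbf{K}^{-1} = \mathbf{L} + \delta \mathbf{I}. \tag{7}$$

Note that $\mathbf{K}$ has the same eigenvectors as $\mathbf{L}$, while the
corresponding eigenvalues are $\sigma_i = \frac{1}{\lambda_i + \delta}$. Thus,
$\mathbf{K}$ can be diagonalized as

$$\mathbf{K} = \sum_{i=1}^{N} \frac{1}{\lambda_i + \delta} \mathbf{u}^i \mathbf{u}^{iT} = \mathbf{U} \mathbf{\Sigma} \mathbf{U}^T, \tag{8}$$

where $\mathbf{\Sigma} = \mathrm{diag}\{\sigma_1, \ldots, \sigma_N\}$. The
advantage of introducing the parameter $\delta$ is that it leads to a
non-singular precision matrix and thus allows us to have a proper covariance
matrix. $\sigma_1 = 1/\delta$ can be thought of as the variance of the DC
component of $\mathbf{f}$, since $\lambda_1 = 0$ and $\mathbf{u}^1$ has unit norm.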
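A short NumPy sketch of the GRF covariance in (7) and (8) is given here. The toy graph and the value of `delta` are assumptions picked for illustration, and the sampling step uses the spectral form of the covariance rather than any procedure stated in the paper.

```python
import numpy as np

# Hypothetical toy graph: unweighted path on N = 6 nodes.
N = 6
W = np.zeros((N, N))
for i in range(N - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0
L = np.diag(W.sum(axis=1)) - W

delta = 0.1                                 # illustrative regularization value
K = np.linalg.inv(L + delta * np.eye(N))    # covariance from (7)

lam, U = np.linalg.eigh(L)
sigma = 1.0 / (lam + delta)                 # eigenvalues of K
K_spectral = (U * sigma) @ U.T              # U diag(sigma) U^T, as in (8)

# Draw f ~ N(0, K): the GFT coefficients are independent with variance sigma_i.
rng = np.random.default_rng(0)
f = U @ (np.sqrt(sigma) * rng.standard_normal(N))

# Variance of the DC component <f, u^1>, with u^1 the constant unit vector.
u1 = np.ones(N) / np.sqrt(N)
dc_variance = u1 @ K @ u1
```

The two constructions of the covariance agree, and `dc_variance` evaluates to `1 / delta`, which shows how a small `delta` leaves the mean level of the signal almost unconstrained.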




4. SAMPLING THEORY AND INFERENCE OVER GRF 

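Consider a signal $\mathbf{f}$ generated using the GRF defined in (6) with
covariance matrix $\mathbf{K} = (\mathbf{L} + \delta \mathbf{I})^{-1}$. As in
the sampling problem, we observe the samples of $\mathbf{f}$ on a subset
$\mathcal{S}$ of nodes. Our goal is to estimate the unknown samples. It is well
known that the conditional distribution of $\mathbf{f}_{\mathcal{S}^c}$ given
$\mathbf{f}_{\mathcal{S}}$ equals
$\mathcal{N}(\boldsymbol{\mu}_{\mathcal{S}^c|\mathcal{S}}, \mathbf{K}_{\mathcal{S}^c|\mathcal{S}})$, where

$$\boldsymbol{\mu}_{\mathcal{S}^c|\mathcal{S}} = \mathbf{K}_{\mathcal{S}^c\mathcal{S}} (\mathbf{K}_{\mathcal{S}})^{+} \mathbf{f}_{\mathcal{S}} \quad \text{and} \tag{9}$$

$$\mathbf{K}_{\mathcal{S}^c|\mathcal{S}} = \mathbf{K}_{\mathcal{S}^c} - \mathbf{K}_{\mathcal{S}^c\mathcal{S}} (\mathbf{K}_{\mathcal{S}})^{+} \mathbf{K}_{\mathcal{S}\mathcal{S}^c} \tag{10}$$

are the MAP estimate and the predictive covariance matrix of
$\mathbf{f}_{\mathcal{S}^c}$ given $\mathbf{f}_{\mathcal{S}}$, respectively [?].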
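Equations (9) and (10) translate into a few lines of NumPy. The helper name `gaussian_conditional`, the toy graph, `delta` and the observed values `f_S` are hypothetical; the sketch also forms the precision matrix `P` to show the standard equivalent expressions in terms of its blocks.

```python
import numpy as np

def gaussian_conditional(K, S, f_S):
    """MAP estimate (9) and predictive covariance (10) of f on S^c given f_S."""
    N = K.shape[0]
    S_set = set(S)
    Sc = [i for i in range(N) if i not in S_set]
    K_S = K[np.ix_(S, S)]
    K_ScS = K[np.ix_(Sc, S)]
    K_Sc = K[np.ix_(Sc, Sc)]
    K_S_pinv = np.linalg.pinv(K_S, rcond=1e-10)
    mean = K_ScS @ K_S_pinv @ f_S                 # (9)
    cov = K_Sc - K_ScS @ K_S_pinv @ K_ScS.T       # (10)
    return Sc, mean, cov

# Hypothetical toy graph: unweighted path on N = 6 nodes.
N = 6
W = np.zeros((N, N))
for i in range(N - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0
L = np.diag(W.sum(axis=1)) - W

delta = 0.1
P = L + delta * np.eye(N)      # precision matrix, as in (7)
K = np.linalg.inv(P)

S = [0, 3]
f_S = np.array([1.0, -1.0])    # illustrative observed samples
Sc, mean, cov = gaussian_conditional(K, S, f_S)
```

For a full rank covariance the predictive covariance equals the inverse of the precision block on the unobserved nodes, which is the identity that links it to the Laplacian submatrix used by the cut-off frequency estimate.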

4.1. Bandlimited Reconstruction as MAP Inference

Let $\lambda_r$ be the largest eigenvalue of $\mathbf{L}$ which is less than
$\omega$. We define $\hat{\mathbf{K}}$ to be a low rank approximation of
$\mathbf{K}$ which only contains the spectral components corresponding to
$\{\lambda_1, \ldots, \lambda_r\}$, i.e.,

$$\hat{\mathbf{K}} = \sum_{i=1}^{r} \frac{1}{\lambda_i + \delta} \mathbf{u}^i \mathbf{u}^{iT}. \tag{11}$$

Consider the problem of reconstructing a random signal generated
using a GRF with covariance $\hat{\mathbf{K}}$ from its samples on
$\mathcal{S}$. The following theorem shows that, if conditions of the sampling
theorem are satisfied, then the error of bandlimited reconstruction is zero.

Theorem 2. Let $\mathbf{f}$ be a random graph signal generated using the
GRF with covariance $\hat{\mathbf{K}}$ given by (11). Let
$\hat{\mathbf{f}}_{\mathcal{S}^c}$ be the bandlimited reconstruction of
$\mathbf{f}_{\mathcal{S}^c}$ obtained from its samples on $\mathcal{S}$, where
$\mathcal{S}$ is a uniqueness set for $PW_{\omega}(G)$. Then,
$\|\hat{\mathbf{f}}_{\mathcal{S}^c} - \mathbf{f}_{\mathcal{S}^c}\| = 0$.
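Theorem 2 can be checked numerically on a toy example. In the sketch below the graph, `delta`, the rank `r` and the sampling set `S` are assumptions made for illustration; the code compares the MAP estimate under the low rank covariance (11) with the least squares reconstruction (5).

```python
import numpy as np

# Hypothetical toy setup: unweighted path on 8 nodes, rank r = 3.
N, r, delta = 8, 3, 0.1
W = np.zeros((N, N))
for i in range(N - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0
L = np.diag(W.sum(axis=1)) - W
lam, U = np.linalg.eigh(L)
sigma = 1.0 / (lam + delta)

U_R = U[:, :r]
K_hat = (U_R * sigma[:r]) @ U_R.T          # low rank covariance (11)

S = [0, 3, 5, 7]                           # sampling set with |S| > r
Sc = [i for i in range(N) if i not in S]

# Draw f ~ N(0, K_hat); such a signal is bandlimited by construction.
rng = np.random.default_rng(1)
f = U_R @ (np.sqrt(sigma[:r]) * rng.standard_normal(r))

# MAP estimate and predictive covariance with K_hat, following (9) and (10).
K_S_pinv = np.linalg.pinv(K_hat[np.ix_(S, S)], rcond=1e-10)
K_ScS = K_hat[np.ix_(Sc, S)]
map_estimate = K_ScS @ K_S_pinv @ f[S]
pred_cov = K_hat[np.ix_(Sc, Sc)] - K_ScS @ K_S_pinv @ K_ScS.T

# Bandlimited least squares reconstruction (5).
U_SR, U_ScR = U[S, :r], U[Sc, :r]
bl_estimate = U_ScR @ np.linalg.solve(U_SR.T @ U_SR, U_SR.T @ f[S])
```

Here `K_hat` restricted to `S` is singular (rank 3 in a 4 by 4 block), so the pseudo-inverse in (9) and (10) is actually needed; the two estimates still coincide and the predictive covariance vanishes.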

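Before proving the above theorem, we show, in the following
lemma, that bandlimited reconstruction is equivalent to MAP
inference on the GRF with covariance $\hat{\mathbf{K}}$.

Lemma 1. Let $\mathcal{S} \subset \mathcal{V}$ be a uniqueness set for
$PW_{\omega}(G)$. Then the MAP estimate of $\mathbf{f}_{\mathcal{S}^c}$ given
$\mathbf{f}_{\mathcal{S}}$ in a GRF with covariance matrix $\hat{\mathbf{K}}$
is equal to the bandlimited reconstruction given by (5).

Proof. Under a permutation which groups together nodes in $\mathcal{S}^c$ and
$\mathcal{S}$, we can write $\hat{\mathbf{K}}$ as the following block matrix:

$$\hat{\mathbf{K}} = \begin{pmatrix} \hat{\mathbf{K}}_{\mathcal{S}^c} & \hat{\mathbf{K}}_{\mathcal{S}^c\mathcal{S}} \\ \hat{\mathbf{K}}_{\mathcal{S}\mathcal{S}^c} & \hat{\mathbf{K}}_{\mathcal{S}} \end{pmatrix} = \begin{pmatrix} \mathbf{U}_{\mathcal{S}^c\mathcal{R}} \mathbf{\Sigma}_{\mathcal{R}} \mathbf{U}_{\mathcal{S}^c\mathcal{R}}^T & \mathbf{U}_{\mathcal{S}^c\mathcal{R}} \mathbf{\Sigma}_{\mathcal{R}} \mathbf{U}_{\mathcal{S}\mathcal{R}}^T \\ \mathbf{U}_{\mathcal{S}\mathcal{R}} \mathbf{\Sigma}_{\mathcal{R}} \mathbf{U}_{\mathcal{S}^c\mathcal{R}}^T & \mathbf{U}_{\mathcal{S}\mathcal{R}} \mathbf{\Sigma}_{\mathcal{R}} \mathbf{U}_{\mathcal{S}\mathcal{R}}^T \end{pmatrix}, \tag{12}$$

where $\mathbf{\Sigma}_{\mathcal{R}} = \mathrm{diag}\{\sigma_1, \ldots, \sigma_r\}$.
Therefore, we can write the MAP estimate obtained with covariance
$\hat{\mathbf{K}}$ as

$$\hat{\boldsymbol{\mu}}_{\mathcal{S}^c|\mathcal{S}} = \mathbf{U}_{\mathcal{S}^c\mathcal{R}} \mathbf{\Sigma}_{\mathcal{R}} \mathbf{U}_{\mathcal{S}\mathcal{R}}^T \left( \mathbf{U}_{\mathcal{S}\mathcal{R}} \mathbf{\Sigma}_{\mathcal{R}} \mathbf{U}_{\mathcal{S}\mathcal{R}}^T \right)^{+} \mathbf{f}_{\mathcal{S}}. \tag{13}$$

Because $\omega < \omega_c(\mathcal{S})$, we have that
$\mathbf{U}_{\mathcal{S}\mathcal{R}}$ has full column rank and, equivalently,
$\mathbf{U}_{\mathcal{S}\mathcal{R}}^T$ has full row rank. Therefore, we can
write
$(\mathbf{U}_{\mathcal{S}\mathcal{R}} \mathbf{\Sigma}_{\mathcal{R}} \mathbf{U}_{\mathcal{S}\mathcal{R}}^T)^{+} = (\mathbf{U}_{\mathcal{S}\mathcal{R}}^T)^{+} \mathbf{\Sigma}_{\mathcal{R}}^{+} \mathbf{U}_{\mathcal{S}\mathcal{R}}^{+}$
and
$\mathbf{U}_{\mathcal{S}\mathcal{R}}^{+} = (\mathbf{U}_{\mathcal{S}\mathcal{R}}^T \mathbf{U}_{\mathcal{S}\mathcal{R}})^{-1} \mathbf{U}_{\mathcal{S}\mathcal{R}}^T$.
Simplifying (13) using these equalities leads to

$$\hat{\boldsymbol{\mu}}_{\mathcal{S}^c|\mathcal{S}} = \mathbf{U}_{\mathcal{S}^c\mathcal{R}} \left( \mathbf{U}_{\mathcal{S}\mathcal{R}}^T \mathbf{U}_{\mathcal{S}\mathcal{R}} \right)^{-1} \mathbf{U}_{\mathcal{S}\mathcal{R}}^T \mathbf{f}_{\mathcal{S}},$$

which is equal to the least squares solution given in (5). □

Proof of Theorem 2. From Lemma 1,
$\hat{\mathbf{f}}_{\mathcal{S}^c} = \hat{\boldsymbol{\mu}}_{\mathcal{S}^c|\mathcal{S}}$.
Therefore,
$E(\|\mathbf{f}_{\mathcal{S}^c} - \hat{\mathbf{f}}_{\mathcal{S}^c}\|^2) = \mathrm{tr}\left( E\left[ (\mathbf{f}_{\mathcal{S}^c} - \hat{\boldsymbol{\mu}}_{\mathcal{S}^c|\mathcal{S}}) (\mathbf{f}_{\mathcal{S}^c} - \hat{\boldsymbol{\mu}}_{\mathcal{S}^c|\mathcal{S}})^T \right] \right) = \mathrm{tr}(\hat{\mathbf{K}}_{\mathcal{S}^c|\mathcal{S}})$.
Now,
$\hat{\mathbf{K}}_{\mathcal{S}^c|\mathcal{S}} = \hat{\mathbf{K}}_{\mathcal{S}^c} - \hat{\mathbf{K}}_{\mathcal{S}^c\mathcal{S}} (\hat{\mathbf{K}}_{\mathcal{S}})^{+} \hat{\mathbf{K}}_{\mathcal{S}\mathcal{S}^c}$.
Using the block form of $\hat{\mathbf{K}}$ in (12), and the fact that
$\mathbf{U}_{\mathcal{S}\mathcal{R}}$ has full column rank, it is easy to show
that $\hat{\mathbf{K}}_{\mathcal{S}^c|\mathcal{S}} = \mathbf{0}$, which implies
$E(\|\mathbf{f}_{\mathcal{S}^c} - \hat{\mathbf{f}}_{\mathcal{S}^c}\|^2) = 0$.
But since $\|\hat{\mathbf{f}}_{\mathcal{S}^c} - \mathbf{f}_{\mathcal{S}^c}\| \ge 0$,
we get $\|\hat{\mathbf{f}}_{\mathcal{S}^c} - \mathbf{f}_{\mathcal{S}^c}\| = 0$
with probability one. □

4.2. Cut-off Frequency and Estimation Error

If the true covariance matrix is only approximately low rank, then
MAP inference with $\hat{\mathbf{K}}$ gives a non-zero reconstruction error. The
best sampling set in this case is the one which minimizes the
predictive covariance. According to the sampling theory of graph signals,
the optimal sampling set of given size is the one which has the largest
associated cut-off frequency. We show that finding a sampling set
$\mathcal{S}$ which maximizes a crude estimate of the cut-off frequency,
$\Omega_1(\mathcal{S})$, is equivalent to minimizing the maximum eigenvalue of
the predictive covariance of $\mathbf{f}_{\mathcal{S}^c}$ given
$\mathbf{f}_{\mathcal{S}}$.

Proposition 1. Let
$\mathcal{S}_{\mathrm{opt}} = \arg\max_{|\mathcal{S}| = m} \Omega_1(\mathcal{S})$.
Let $\mathbf{K} = (\mathbf{L} + \delta \mathbf{I})^{-1}$. Then,
$\mathcal{S}_{\mathrm{opt}} = \arg\min_{|\mathcal{S}| = m} \lambda_{\max}[\mathbf{K}_{\mathcal{S}^c|\mathcal{S}}]$.

Proof. Consider a block matrix representation of $\mathbf{K}$ similar to (12).
Using the block matrix inversion formula, we can write $\mathbf{K}^{-1}$ as

$$\mathbf{K}^{-1} = \begin{pmatrix} \mathbf{S}_{\mathbf{K}_{\mathcal{S}}}^{-1} & -(\mathbf{K}_{\mathcal{S}^c})^{-1} \mathbf{K}_{\mathcal{S}^c\mathcal{S}} \mathbf{S}_{\mathbf{K}_{\mathcal{S}^c}}^{-1} \\ -(\mathbf{K}_{\mathcal{S}})^{-1} \mathbf{K}_{\mathcal{S}\mathcal{S}^c} \mathbf{S}_{\mathbf{K}_{\mathcal{S}}}^{-1} & \mathbf{S}_{\mathbf{K}_{\mathcal{S}^c}}^{-1} \end{pmatrix},$$

where

$$\mathbf{S}_{\mathbf{K}_{\mathcal{S}}} = \mathbf{K}_{\mathcal{S}^c} - \mathbf{K}_{\mathcal{S}^c\mathcal{S}} (\mathbf{K}_{\mathcal{S}})^{-1} \mathbf{K}_{\mathcal{S}\mathcal{S}^c}, \qquad \mathbf{S}_{\mathbf{K}_{\mathcal{S}^c}} = \mathbf{K}_{\mathcal{S}} - \mathbf{K}_{\mathcal{S}\mathcal{S}^c} (\mathbf{K}_{\mathcal{S}^c})^{-1} \mathbf{K}_{\mathcal{S}^c\mathcal{S}} \tag{14}$$

are the Schur complements of $\mathbf{K}_{\mathcal{S}}$ and
$\mathbf{K}_{\mathcal{S}^c}$, respectively. Since
$\mathbf{K}^{-1} = \mathbf{L} + \delta \mathbf{I}$, we have
$\mathbf{L}_{\mathcal{S}^c} = (\mathbf{K}^{-1})_{\mathcal{S}^c} - \delta \mathbf{I}_{\mathcal{S}^c} = \mathbf{S}_{\mathbf{K}_{\mathcal{S}}}^{-1} - \delta \mathbf{I}_{\mathcal{S}^c}$.
Note that $\mathbf{S}_{\mathbf{K}_{\mathcal{S}}} = \mathbf{K}_{\mathcal{S}^c|\mathcal{S}}$.
Thus, the estimated cut-off frequency corresponding to the subset $\mathcal{S}$
of nodes can be written in terms of the conditional covariance matrix as

$$\Omega_1(\mathcal{S}) = \lambda_{\min}[\mathbf{L}_{\mathcal{S}^c}] = \frac{1}{\lambda_{\max}[\mathbf{K}_{\mathcal{S}^c|\mathcal{S}}]} - \delta. \tag{15}$$

The result readily follows from this. □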

A sampling set with the largest estimated cut-off frequency
$\Omega_1(\mathcal{S})$ also minimizes the worst case prediction error of the MAP
estimate on a GRF with $\mathbf{K} = (\mathbf{L} + \delta \mathbf{I})^{-1}$.
However, as shown in Lemma 1, bandlimited signal reconstruction is equivalent
to MAP estimation with a low rank approximation of $\mathbf{K}$. Intuitively, a
better estimate of the predictive covariance, in this model of signal
reconstruction, can be obtained with larger values of $k$, as this gives more
weight to the principal components with larger variance. This justifies the
use of $\Omega_k(\mathcal{S})$ with $k > 1$ as a criterion
for active learning.
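The identity (15) behind Proposition 1 can be verified by brute force on a small graph. The sketch below uses a hypothetical random dense weighted graph on seven nodes and an illustrative `delta`; it enumerates every subset of size `m`, so it is only meant for toy sizes.

```python
import itertools
import numpy as np

# Hypothetical toy setup: random dense weighted graph on N = 7 nodes.
rng = np.random.default_rng(0)
N, m, delta = 7, 3, 0.5
A = rng.uniform(0.1, 1.0, size=(N, N))
W = np.triu(A, 1)
W = W + W.T                                  # symmetric weights, zero diagonal
L = np.diag(W.sum(axis=1)) - W
K = np.linalg.inv(L + delta * np.eye(N))

def complement(S):
    return [i for i in range(N) if i not in S]

def omega_1(S):
    """Crude cut-off estimate: smallest eigenvalue of L restricted to S^c."""
    Sc = complement(S)
    return np.linalg.eigvalsh(L[np.ix_(Sc, Sc)])[0]

def worst_case_variance(S):
    """Largest eigenvalue of the predictive covariance (10)."""
    S, Sc = list(S), complement(S)
    K_ScS = K[np.ix_(Sc, S)]
    cov = K[np.ix_(Sc, Sc)] - K_ScS @ np.linalg.solve(K[np.ix_(S, S)], K_ScS.T)
    return np.linalg.eigvalsh(cov)[-1]

subsets = list(itertools.combinations(range(N), m))
S_cutoff = max(subsets, key=omega_1)               # maximizes Omega_1
S_cov = min(subsets, key=worst_case_variance)      # minimizes lambda_max
```

Since the two criteria are related by a strictly decreasing map, both searches reach the same optimal objective value; comparing values rather than the sets themselves avoids spurious mismatches when several subsets tie.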

4.3. Justification for the Sampling Theoretic Approach to Active 
Semi-supervised Classification 

MAP estimation is optimal for reconstructing signals generated
using a GRF with a full rank covariance matrix, because it minimizes
the mean squared error of estimation. Moreover, since the estimation
error equals $\mathrm{tr}(\mathbf{K}_{\mathcal{S}^c|\mathcal{S}})$, an optimal
sampling set of size $m$ is given by
$\arg\min_{|\mathcal{S}| = m} \mathrm{tr}(\mathbf{K}_{\mathcal{S}^c|\mathcal{S}})$.
Indeed, this is the so-called V-optimality
criterion for active learning proposed in [10].

However, in a classification problem, data points in the same
class are highly correlated whereas data points in different classes
have very small correlation. Since the number of classes is typically
very small compared to the number of data points, we expect the
(unknown) "true" covariance matrix to be very well approximated by a
low rank matrix [?]. Thus, bandlimited interpolation is a better
model for signal reconstruction in this context, since it is equivalent
to MAP estimation with a low rank covariance matrix.
Maximizing the cut-off frequency is a natural set selection criterion for this
learning model.

5. RELATED WORK 

Different criteria have been proposed for batch mode active learning
on Gaussian random fields. The approach presented in [?] selects
the points to label such that the mutual information between the labelled
and unlabelled data points is maximized. Our sampling theoretic
approach [7] is more similar to the methods proposed in [10, 11].
These methods use MAP estimation on GRF [9] as their model for
label prediction. As stated before, [10] chooses the sampling set
$\mathcal{S}$ by minimizing $\mathrm{tr}(\mathbf{K}_{\mathcal{S}^c|\mathcal{S}})$.
The method in [11], on the other hand, tries to minimize
$\mathbf{1}^T \mathbf{K}_{\mathcal{S}^c|\mathcal{S}} \mathbf{1}$ (also known as
the $\Sigma$-optimality criterion). This is equivalent to minimizing the risk
of the surveying problem [?] (which is the problem of determining the
proportion of nodes belonging to one class). All the above methods are closely
related to the optimal design of experiments [?]. Experiment design
deals with the problem of estimating a vector from a set of linear
measurements. The goal is to choose the optimal set of $m$
measurements so that the estimation error is minimized. Different error
measures lead to different optimality criteria. For example, minimizing
the trace of the estimation covariance leads to the A-optimal design, whereas
minimizing its determinant gives the D-optimal design. The
sampling theoretic approach is closer to the so-called E-optimal design,
which minimizes the worst case prediction error given by the
maximum eigenvalue of the predictive covariance matrix.
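The selection criteria compared in this section are all scalar functions of the same predictive covariance, which the following NumPy sketch makes explicit. The helper names, the toy graph, `delta` and the set `S` are illustrative assumptions rather than code from any of the cited methods.

```python
import numpy as np

def predictive_covariance(K, S):
    """Predictive covariance (10) of the unobserved nodes given those in S."""
    N = K.shape[0]
    S_set = set(S)
    Sc = [i for i in range(N) if i not in S_set]
    K_ScS = K[np.ix_(Sc, S)]
    return K[np.ix_(Sc, Sc)] - K_ScS @ np.linalg.solve(K[np.ix_(S, S)], K_ScS.T)

def design_criteria(K, S):
    """Three scalar summaries of the predictive covariance for a set S."""
    cov = predictive_covariance(K, S)
    ones = np.ones(cov.shape[0])
    return {
        "V_optimality": np.trace(cov),               # trace
        "Sigma_optimality": ones @ cov @ ones,       # sum of all entries
        "worst_case": np.linalg.eigvalsh(cov)[-1],   # maximum eigenvalue
    }

# Hypothetical toy graph: unweighted path on N = 8 nodes.
N = 8
W = np.zeros((N, N))
for i in range(N - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0
L = np.diag(W.sum(axis=1)) - W
delta = 0.1
K = np.linalg.inv(L + delta * np.eye(N))

S = [1, 4]
criteria = design_criteria(K, S)
```

A selection method then amounts to searching for the set that minimizes one of these values; the maximum eigenvalue is bounded above by the trace, so the worst case criterion is the most conservative of the three per direction of error.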

6. EXPERIMENTS

To demonstrate the effectiveness of the framework of sampling
theory, we first apply it to the problem of graph based active
semi-supervised classification. In our experiment, we use a subset of
the USPS handwritten digit dataset containing 100 images of size
$16 \times 16$ for each of the digits 0 to 9. We construct a weighted
$K$-NN graph of 1000 nodes with $K = 10$ and the similarities given by
$w_{ij} = \exp(\cdots)$. The problem is to choose the nodes to be
labelled and then predict the unknown labels from the queried labels.
We consider different combinations of active learning criteria and
learning models. As expected from the discussion in Section 4.3,
selecting the sampling set by maximizing the cut-off frequency and
then performing bandlimited reconstruction outperforms the $\Sigma$- and
V-optimality criteria used in conjunction with MAP estimation (see
Figure 1(a)). Even if the learning model is fixed to bandlimited
interpolation, the sampling theoretic approach gives better results, as seen
in Figure 1(b). This is because maximizing the cut-off frequency is
a more suitable set selection criterion under this model.

On the other hand, if we consider the problem of regression of a
random real valued graph signal generated using a covariance matrix
that is not low rank, a V-optimal set is expected to give a better SNR
of reconstruction. This is demonstrated in Figure 2, where we
reconstruct a random real valued signal generated with the covariance
matrix obtained using the graph from the previous example.

Fig. 1: Performance of different active learning criteria in
conjunction with two learning models, namely, (a) MAP [9] and (b)
bandlimited reconstruction (BL).

Fig. 2: Performance in the case of reconstruction of a random real valued
signal (averaged over 100 trials).

7. CONCLUSION AND FUTURE WORK

In this paper, we gave a probabilistic interpretation for the sampling
theory of graph signals. We showed that if the data is generated using
a Gaussian random field whose precision matrix is the regularized graph
Laplacian $\mathbf{L} + \delta \mathbf{I}$, then bandlimited reconstruction is
equivalent to MAP inference on an approximation of this GRF which has a low rank
covariance matrix. Moreover, an optimal sampling set obtained via
sampling theory minimizes the worst case predictive covariance of
MAP estimation on the GRF.

A probabilistic interpretation allows us to view graph signal
sampling theory as a model based method. It would be interesting to
consider it as part of a larger probabilistic model which refines the
covariance matrix as more data is observed. This interpretation also
suggests a generalization of the sampling theory to non-Gaussian
models, which might be more realistic for some applications.